Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PCP: /api/v1/regions optimization #1970

Merged
merged 14 commits into from
Dec 2, 2019
Merged

Conversation

ekalinin
Copy link
Contributor

@ekalinin ekalinin commented Nov 25, 2019

What problem does this PR solve?

PCP #1837

What is changed and how it works?

convertToAPIRegions optimization

  • reduced the amount of allocations in convertToAPIRegions by making a slice of RegionInfo.
  • added a new function - InitRegion. Used it in a loop to init a slice
    of pointers to the api.RegionInfo structs without additional allocations.
  • added a method HexRegionKeyStr. It returns a string (and not a []byte).
    StartKey/EndKey fields are string in the api.RegionInfo, and we do not need one more convertation (and allocation) from []byte to string.
  • added benchmarks:
    • BenchmarkConvertToAPIRegions
    • BenchmarkHexRegionKey
    • BenchmarkHexRegionKeyStr

Result:

# GO111MODULE=on go test github.com/pingcap/pd/server/api -run=None -bench=BenchmarkConvertToAPIRegions -benchmem > new.api.txt
➜ benchcmp old.api.txt new.api.txt
benchmark                           old ns/op     new ns/op     delta
BenchmarkConvertToAPIRegions-12     2880085       2208890       -23.30%

benchmark                           old allocs    new allocs    delta
BenchmarkConvertToAPIRegions-12     82182         52183         -36.50%

benchmark                           old bytes     new bytes     delta
BenchmarkConvertToAPIRegions-12     2162007       1947634       -9.92%

h.rd.JSON optimization

  • added server/api/router.go#createRender function for creating *render.Render
  • removed option IndentJSON from the render.New call
  • added option StreamingJSON to the render.New call
  • added server/api/region_test.go#BenchmarkRenderJSON for testing the performance of the last part of the regionsHandler.GetAll function (which takes 90% of the total time for the /pd/api/v1/regions endpoint in PCP PCP-24: Improve the performance of the HTTP API for getting all regions #1837)
# GO111MODULE=on go test github.com/pingcap/pd/server/api -run=Benchmark -bench=BenchmarkRenderJSON -benchmem > new.txt
➜ benchcmp old.txt new.txt
benchmark                  old ns/op     new ns/op     delta
BenchmarkRenderJSON-12     662659672     147632381     -77.72%

benchmark                  old allocs    new allocs    delta
BenchmarkRenderJSON-12     50            12            -76.00%

benchmark                  old bytes     new bytes     delta
BenchmarkRenderJSON-12     414771916     51753928      -87.52%

Check List

Tests

  • Unit test
  • Integration test

@sre-bot
Copy link
Contributor

sre-bot commented Nov 25, 2019

Thanks for your contribution. If your PR get merged, you will be rewarded 900 points.

@nolouch
Copy link
Contributor

nolouch commented Nov 26, 2019

Thanks, @ekalinin, Greatly work.
In the origin issue, the convertToAPIRegions also allocated much memory, can we do something for this?
image

@ekalinin
Copy link
Contributor Author

In the origin issue, the convertToAPIRegions also allocated much memory, can we do something for this?

Sure, will take a look on it today.

- reduced the amount of allocation in convertToAPIRegions by making a
slice of RegionInfo.
- added a new function - InitRegion. Used it in a loop to init a slice
of pointers to the RegionInfo structs without additional allocations.
- added method HexRegionKeyStr. It returns string (and not []byte).
StartKey/EndKey are string, and we do not need one more convert (and
allocation) from []byte to string.
- added benchmarks:
  - BenchmarkConvertToAPIRegions
  - BenchmarkHexRegionKey
  - BenchmarkHexRegionKeyStr

Result:

➜ benchcmp old.api.txt new.api.txt
benchmark                           old ns/op     new ns/op     delta
BenchmarkConvertToAPIRegions-12     2880085       2208890       -23.30%

benchmark                           old allocs     new allocs     delta
BenchmarkConvertToAPIRegions-12     82182          52183          -36.50%

benchmark                           old bytes     new bytes     delta
BenchmarkConvertToAPIRegions-12     2162007       1947634       -9.92%

➜ GO111MODULE=on go test github.com/pingcap/pd/server/api -run=Benchmark -bench=BenchmarkHexRegionKey -benchmem
goos: linux
goarch: amd64
pkg: github.com/pingcap/pd/server/api
BenchmarkHexRegionKey-12       	 1229766	      1239 ns/op	     240 B/op	       5 allocs/op
BenchmarkHexRegionKeyStr-12    	 1226398	      1018 ns/op	     144 B/op	       3 allocs/op
@ekalinin ekalinin changed the title PCP: json api, drop indent, add streaming PCP: /api/v1/regions optimization Nov 26, 2019
@codecov-io
Copy link

codecov-io commented Nov 26, 2019

Codecov Report

❗ No coverage uploaded for pull request base (master@a87c9eb). Click here to learn what that means.
The diff coverage is 100%.

Impacted file tree graph

@@           Coverage Diff            @@
##             master   #1970   +/-   ##
========================================
  Coverage          ?   77.9%           
========================================
  Files             ?     174           
  Lines             ?   17626           
  Branches          ?       0           
========================================
  Hits              ?   13732           
  Misses            ?    2834           
  Partials          ?    1060
Impacted Files Coverage Δ
server/core/region.go 91.49% <100%> (ø)
server/api/router.go 99.19% <100%> (ø)
server/api/region.go 77.25% <100%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update a87c9eb...fd26ae2. Read the comment docs.

Copy link
Contributor

@nolouch nolouch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi, thanks @ekalinin, you have done several significant optimizations, but may we can still try more for :

  • Conversion between []byte and string may have memory allocation, you have added a function HexRegionKeyStr to reduce one conversion, but there still have some conversions. maybe we can try this method to do something.
  • strings.Upper() will do a copy, we can improve it to zero-copy with doing it in slice[]byte.
  • hex.EncodingToString also have a conversion.

server/api/region_test.go Outdated Show resolved Hide resolved
server/api/region_test.go Outdated Show resolved Hide resolved
server/api/region_test.go Outdated Show resolved Hide resolved
server/api/region.go Outdated Show resolved Hide resolved
ekalinin and others added 2 commits November 27, 2019 14:25
- tRegions: removed args, used predefined args
- added Slice/String functions for zero-cost convertations
- added ToUpperASCIIInplace function (zero cost ToUppper)
- added EncodeToString. It overrides hex.EncodeToString. Difference: returns []byte, not string
@ekalinin
Copy link
Contributor Author

ekalinin commented Nov 27, 2019

Changes:

  • region_test.newTestRegions: removed args, used predefined args
  • added String function for zero-cost conversation from []byte to string
  • added ToUpperASCIIInplace function (zero cost copy of ToUppper)
  • added EncodeToString. It overrides hex.EncodeToString. Difference: returns []byte, not string

Results:

# GO111MODULE=on go test github.com/pingcap/pd/server/api -run=Benchmark -bench=BenchmarkHexRegionKey -benchmem

➜ benchcmp hex.old.txt hex.new.txt 
benchmark                       old ns/op     new ns/op     delta
BenchmarkHexRegionKey-12        532           137           -74.25%
BenchmarkHexRegionKeyStr-12     470           140           -70.21%

benchmark                       old allocs     new allocs     delta
BenchmarkHexRegionKey-12        5              1              -80.00%
BenchmarkHexRegionKeyStr-12     3              1              -66.67%

benchmark                       old bytes     new bytes     delta
BenchmarkHexRegionKey-12        400           48            -88.00%
BenchmarkHexRegionKeyStr-12     240           48            -80.00%

and in general:

➜ benchcmp old.api.txt new.api.txt
benchmark                           old ns/op     new ns/op     delta
BenchmarkConvertToAPIRegions-12     2880085       1550980       -46.15%

benchmark                           old allocs     new allocs     delta
BenchmarkConvertToAPIRegions-12     82182          20003          -75.66%

benchmark                           old bytes     new bytes     delta
BenchmarkConvertToAPIRegions-12     2162007       2002431       -7.38%

@ekalinin
Copy link
Contributor Author

/run-all-tests

Copy link
Contributor

@nolouch nolouch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do the test same as the original issue, with 1,000,000 regions:
before

[2019/10/23 12:19:55.653 +00:00] [INFO] [region.go:154] ["[DEBUG] got all regions from core"] [cost=149.863003ms]
[2019/10/23 12:19:59.083 +00:00] [INFO] [region.go:157] ["[DEBUG] convert all regions to API regions"] [cost=3.430113877s]
[2019/10/23 12:20:17.575 +00:00] [INFO] [region.go:160] ["[DEBUG] send all regions to client"] [cost=18.491626948s]

after

[2019/11/27 15:15:14.985 +00:00] [INFO] [region.go:155] ["[DEBUG] got all regions from core"] [cost=31.286284ms]
[2019/11/27 15:15:16.834 +00:00] [INFO] [region.go:158] ["[DEBUG] convert all regions to API regions"] [cost=1.849013297s]
[2019/11/27 15:15:22.415 +00:00] [INFO] [region.go:161] ["[DEBUG] send all regions to client"] [cost=5.580648785s]

It's almost 3.5x faster now. but 5s may still too long, can we have more method to optimize it?
and I have got the profile from my test:
cpu.new.pprof.zip
mem.new.pprof.zip

maybe we need to focus on reducing CPU usage now:
image

  • how about have a try https://github.com/json-iterator/go to getting all regions?*
  • gzip the response may speed up transmission*

What is your opinion?

server/api/router.go Show resolved Hide resolved
@ekalinin
Copy link
Contributor Author

I do the test same as the original issue, with 1,000,000 regions:

Could you add a benchmark which would be close to that test with 1M regions?

It's almost 3.5x faster now.

Thanks! That's great!

but 5s may still too long, can we have more method to optimize it?
maybe we need to focus on reducing CPU usage now:

Sure, will try to dig into that direction.

We really need IndentJSON for human readable. can we keep them both at the same time?

Unfortunately, IndentJSON requires a lot of additional work (Marshal, then Indent):

https://github.com/golang/go/blob/8a5af7910a9b157c02736c3e0998a587bb8511c1/src/encoding/json/encode.go#L175-L186

Why just not use external tool? Like:

$ >> ./bin/pd-ctl -u http://0.0.0.0:32833 config show all | python -m json.tool

or

$ >> ./bin/pd-ctl -u http://0.0.0.0:32833 config show all | jq .

@nolouch
Copy link
Contributor

nolouch commented Nov 28, 2019

Could you add a benchmark which would be close to that test with 1M regions?

I have shared my data in here, you can use it to construct the regions in your local benchmark.

Why just not use an external tool? Like

pd-ctl is a tool that our users or DBAs often use it to diagnose problems. , we must ensure that it is easy to use. at lease, we just let fetch all regions to use StreamingJSON, others don't.

@ekalinin
Copy link
Contributor Author

at lease, we just let fetch all regions to use StreamingJSON, others don't.

Done.

@ekalinin
Copy link
Contributor Author

how about have a try https://github.com/json-iterator/go to getting all regions?*

Just tried it. Looks not well:

# current branch

➜ GO111MODULE=on go test github.com/pingcap/pd/server/api -run=Benchmark -bench=BenchmarkRenderJSON
goos: linux
goarch: amd64
pkg: github.com/pingcap/pd/server/api
BenchmarkRenderJSON-12    	      96	  15344540 ns/op	 2429223 B/op	       5 allocs/op
PASS
ok  	github.com/pingcap/pd/server/api	1.732s

# current branch with github.com/json-iterator/go

GO111MODULE=on go test github.com/pingcap/pd/server/api -run=Benchmark -bench=BenchmarkRenderJSON
goos: linux
goarch: amd64
pkg: github.com/pingcap/pd/server/api
BenchmarkRenderJSON-12    	      27	  38831651 ns/op	 6267826 B/op	      18 allocs/op
PASS
ok  	github.com/pingcap/pd/server/api	1.328s

patch:

func (ri *RegionsInfo) MarshalJSON() ([]byte, error) {
	var json = jsoniter.ConfigFastest
	return json.Marshal(*ri)
}

@ekalinin
Copy link
Contributor Author

Tried github.com/bet365/jingo as well. No luck again:

➜ GO111MODULE=on go test github.com/pingcap/pd/server/api -run=Benchmark -bench=BenchmarkRenderJSON
goos: linux
goarch: amd64
pkg: github.com/pingcap/pd/server/api
BenchmarkRenderJSON-12    	      15	  77177199 ns/op	17316683 B/op	   50040 allocs/op
PASS
ok  	github.com/pingcap/pd/server/api	1.426s

patch:

var enc = jingo.NewStructEncoder(RegionInfo{})

func (ri *RegionInfo) MarshalJSON() ([]byte, error) {
 	buf := jingo.NewBufferFromPool()
 	enc.Marshal(ri, buf)

 	b := make([]byte, len(buf.Bytes))
 	copy(b, buf.Bytes)
 	buf.ReturnToPool()
 	return b, nil
}

Copy link
Contributor

@nolouch nolouch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rest lgtm. Thank you for several optimization attempts.

server/api/router.go Outdated Show resolved Hide resolved
@ekalinin
Copy link
Contributor Author

ekalinin commented Dec 1, 2019

Current results of the json render optimization:

# GO111MODULE=on go test github.com/pingcap/pd/server/api -run=Benchmark -bench=BenchmarkRenderJSON -benchmem

➜ benchcmp old.txt new.txt        
benchmark                  old ns/op     new ns/op     delta
BenchmarkRenderJSON-12     662659672     17829821      -97.31%

benchmark                  old allocs    new allocs    delta
BenchmarkRenderJSON-12     50            4             -92.00%

benchmark                  old bytes     new bytes     delta
BenchmarkRenderJSON-12     414771916     2339140       -99.44%

@nolouch
Copy link
Contributor

nolouch commented Dec 2, 2019

PTAL @rleungx

Copy link
Contributor

@lhy1024 lhy1024 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.what format when we get all regions? would you like to provide examples?

Copy link
Member

@rleungx rleungx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@nolouch nolouch added the status/can-merge Indicates a PR has been approved by a committer. label Dec 2, 2019
@sre-bot
Copy link
Contributor

sre-bot commented Dec 2, 2019

/run-all-tests

@sre-bot sre-bot merged commit 9923a25 into tikv:master Dec 2, 2019
@sre-bot
Copy link
Contributor

sre-bot commented Dec 2, 2019

@ekalinin complete task #1837 and get 900 score, currerent score 1450

@ekalinin ekalinin deleted the PCP-24/json-streaming branch December 2, 2019 08:02
rleungx pushed a commit to rleungx/pd that referenced this pull request Dec 5, 2019
rleungx pushed a commit to rleungx/pd that referenced this pull request Dec 5, 2019
rleungx pushed a commit to rleungx/pd that referenced this pull request Dec 5, 2019
Signed-off-by: Ryan Leung <rleungx@gmail.com>
rleungx pushed a commit to rleungx/pd that referenced this pull request Dec 5, 2019
Signed-off-by: Ryan Leung <rleungx@gmail.com>
nolouch pushed a commit that referenced this pull request Dec 5, 2019
… (#1986)

* /api/v1/regions optimization (#1970)

Signed-off-by: Ryan Leung <rleungx@gmail.com>
nolouch pushed a commit that referenced this pull request Dec 5, 2019
Signed-off-by: Ryan Leung <rleungx@gmail.com>
nolouch pushed a commit that referenced this pull request Dec 6, 2019
… (#1987)

* /api/v1/regions optimization (#1970)

* fix travis CI memory problem

Signed-off-by: Ryan Leung <rleungx@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
status/can-merge Indicates a PR has been approved by a committer.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants